Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TensorRT EP] mem leak fix #22863

Closed
wants to merge 4 commits into from
Closed

[TensorRT EP] mem leak fix #22863

wants to merge 4 commits into from

Conversation

chilo-ms
Copy link
Contributor

Fix wrong logic and mem leak

jywu-msft
jywu-msft previously approved these changes Nov 16, 2024
@@ -2413,7 +2413,7 @@ ORT_API(void, OrtApis::ReleaseTensorRTProviderOptions, _Frees_ptr_opt_ OrtTensor
delete[] ptr->trt_profile_opt_shapes;
delete[] ptr->trt_ep_context_file_path;
delete[] ptr->trt_onnx_model_folder_path;
if (ptr->trt_op_types_to_exclude) delete[] ptr->trt_op_types_to_exclude;
if (!ptr->trt_op_types_to_exclude && ptr->trt_op_types_to_exclude_str_is_dynamic_allocation) delete[] ptr->trt_op_types_to_exclude;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why !ptr

Copy link
Contributor Author

@chilo-ms chilo-ms Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we allow user to provide empty string or nullptr to trt_op_types_to_exclude option to override default value.

In OrtApis::UpdateTensorRTProviderOptions, the value of trt_op_types_to_exclude can be '\0' or nullptr, and later it calls TensorrtExecutionProviderInfo::UpdateProviderOptions and when handling the trt_op_types_to_exclude options with copy_string_if_needed(), the trt_op_types_to_exclude is assigned with nullptr.

so, here in the ReleaseTensorRTProviderOptions, we need to check that pointer is not nullptr, otherwise deleting a nulltpr leads to SIGABRT.

@sophies927 sophies927 added the triage:approved Approved for cherrypicks for release label Nov 16, 2024
@@ -360,6 +360,9 @@ void TensorrtExecutionProviderInfo::UpdateProviderOptions(void* provider_options
trt_provider_options_v2.trt_engine_hw_compatible = internal_options.engine_hw_compatible;
trt_provider_options_v2.trt_onnx_bytestream = internal_options.onnx_bytestream;
trt_provider_options_v2.trt_onnx_bytestream_size = internal_options.onnx_bytestream_size;
trt_provider_options_v2.trt_op_types_to_exclude = copy_string_if_needed(internal_options.op_types_to_exclude);
if (options.find("trt_op_types_to_exclude") != options.end()) {
trt_provider_options_v2.trt_op_types_to_exclude = copy_string_if_needed(internal_options.op_types_to_exclude);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_string_if_needed() doesn't always dynamically allocate memory right? only if string_copy is true?
this is_dynamic_allocation check looks ugly.

Copy link
Contributor Author

@chilo-ms chilo-ms Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can add a check that only if string_copy is true, we set is_dynamic_allocation to true.

yeah, agree the is_dynamic_allocation is ugly. But i can't think of a better solution without having a flag.
The reason is options_v2.trt_op_types_to_exclude which is a const char* can be static allocation or dynamic allocation.

  • static allocation - When instantiate an OrtTensorRTProviderOptionsV2 instance, the trt_op_types_to_exclude is static allocation.
  • dynamic allocation - When user calls OrtApis::UpdateTensorRTProviderOptions and specify other value to trt_op_types_to_exclude, then it's dynamic allocation.

So, once ReleaseTensorRTProviderOptions is being called, it's hard to tell that trt_op_types_to_exclude is static allocation or dynamic allocation. (Note: We can't delete a static allocation buffer). That's why we might need a flag here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal file. Why don't we just use a std::string? Then there will be no need to manually delete[] buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We later discussed we will close this PR due to ugly handling of the buffers.

if (options.find("trt_op_types_to_exclude") != options.end()) {
trt_provider_options_v2.trt_op_types_to_exclude = copy_string_if_needed(internal_options.op_types_to_exclude);
trt_provider_options_v2.trt_op_types_to_exclude_str_is_dynamic_allocation = 1;
}
Copy link
Member

@jywu-msft jywu-msft Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we dynamically allocate the ""NonMaxSuppression,NonZero,RoiAlign" string in the else case here if we don't find trt_op_types_to_exclude in options. then we would always need to deallocate it.

Copy link
Contributor Author

@chilo-ms chilo-ms Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we dynamically allocate the ""NonMaxSuppression,NonZero,RoiAlign" string in the else case here if we don't find trt_op_types_to_exclude in options.

i assume you mean the default value "NonMaxSuppression,NonZero,RoiAlign" of trt_op_types_to_exclude, it's static allocation, no need to deallocate. it will be deallocated automatically by the program.

Otherwise, If user uses OrtApis::UpdateTensorRTProviderOptions to update the OrtTensorRTProviderOptionsV2 instance, i don't think the case you mention exist.

if user don't use our API and directly update trt_op_types_to_exclude in the OrtTensorRTProviderOptionsV2 instance, it's user's responsibility to deallocate the buffer.

@sophies927 sophies927 removed triage:approved Approved for cherrypicks for release release:1.20.1 labels Nov 18, 2024
yf711 pushed a commit that referenced this pull request Nov 18, 2024
TRT EP excludes DDS ops from running on TRT and doesn't allow users to
change.
This PR is for ORT 1.20.1 patch release.

We will have better solution to add a new provider option to exclude
specific ops, similar to following:
#22863
#22681
yf711 pushed a commit that referenced this pull request Nov 18, 2024
TRT EP excludes DDS ops from running on TRT and doesn't allow users to
change.
This PR is for ORT 1.20.1 patch release.

We will have better solution to add a new provider option to exclude
specific ops, similar to following:
#22863
#22681
@chilo-ms chilo-ms closed this Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants